Quantifying the effect of temporal resolution on time-varying networks 
Time-varying networks describe a wide array of systems whose constituents and interactions
evolve over time. These networks are defined by an ordered stream of interactions between nodes.
However, they are often represented as a sequence of static networks, resulting from aggregating all
edges and nodes appearing at time intervals of size Δt. In this work we investigate the consequences
of this procedure. In particular, we address the impact of an arbitrary Δt on the description
of a dynamic process taking place upon a time-varying network. We focus on the elementary 
random walk, and put forth a mathematical framework that provides exact results in the context of 
synthetic activity-driven networks. Remarkably, the equations turn out to also describe accurately 
the behavior observed on real datasets. Overall, our analytical description of the bias introduced by 
time integrating techniques represents a step forward in the correct characterization of dynamical 
processes on time-varying graphs. 

Time-varying networks are ubiquitous. Examples are found in the social, cognitive, technological and ecological 
domains as well as in many others £Q. The temporal nature of such systems has a deep influence on dynamical 
processes occurring on top of them [2HT9]. Indeed, the spreading of sexually transmitted diseases, the diffusion of topics
over social networks, and the propagation of ideas in scientific environments are affected by duration, sequence and 
concurrency of contacts [3J [H IPTl - ET] . In all these cases the timescale characterizing the evolution of the network is 
comparable with the timescale ruling the unfolding of the process, and they cannot be decoupled. However, empirical 
datasets are often reduced to a series of static networks by introducing a time-integrating window, Δt [TJ l^2"H2"S] .
Dynamical processes are then left to evolve on the sequence of T/Δt networks, where T is the total time span available.
While this technique might be useful to gain different levels of insight into the dynamics of these processes, it might
introduce strong biases in their characterization [TH31 03 El HE].

Beyond practical reasons, the interplay of timescales in the characterization of processes evolving on time-varying 
networks is a deep and general problem. The minimum time resolution achievable when measuring a network in 
the wild is likely to be finite, and a given Δt may be intrinsic to the description at hand. The latter is the case,
for instance, of face-to-face interaction networks [26], for which the fine-grained temporal resolution of (e.g.) phone 
call networks is not available. Similarly, an intrinsic minimum time resolution might exist. Scientific reviews, for 
example, "aggregate" articles with the periodicity of their publication (e.g. weekly or monthly), which then becomes 
the timescale of the citation or co-authorship networks [2"TH2"5] . 

In this paper, we analyze the role played by an arbitrary Δt. We investigate in detail how the behavior of a
dynamical process depends on the time aggregation window of the underlying time-varying graph. We focus our
attention on the elementary random walk process (RW) and address explicitly the role of Δt in the behavior of the
walker. Adopting the theoretical framework of activity-driven networks [17] (see Methods), we provide an analytical
characterization of the RW asymptotic occupation probability as a function of Δt. The proposed mathematical
framework yields a clear understanding of the effects introduced by time-aggregating techniques on the diffusion
process, accurately describing the biases and distortions introduced by aggregation procedures. We explicitly connect
our results to the well-known RW occupation probability on static networks and extensively test our analytical
results against numerical simulations on synthetic networks. We then extend our validation by considering a set of
distinct real time-varying networks. Remarkably, also in this case the observed effects introduced by time aggregation
are well described by our analytical results, which suggests their validity in a wide class of time-varying networks.

Results 

We consider a random walker diffusing at discrete time steps of length Δt over a time-varying network characterized by N
nodes [42]. Starting at node V(t) at step t, the walker takes step t + 1 at time (t + 1)Δt, diffusing over a network G_t(Δt),
where G_t(Δt) is the union of all the edges generated in the interval [tΔt, (t + 1)Δt) (see Figure 1). We refer
to Δt as the integrating time window of the network. In the limit Δt → 0 the RW process and the network evolve on
the same timescale, with the walker moving as soon as an edge appears. This limit has been studied analytically [18]
in the framework of activity-driven networks [17], where each node is characterized by an activity rate describing the
average edge creation rate of a node in the system (see Methods). Here, we address the general case Δt > 0, which,
as will soon be clear, produces dramatically different walker behavior than the special case Δt → 0.
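
The aggregation step described above can be sketched in a few lines. The function name aggregate_snapshots and the (time, u, v) event format are illustrative assumptions, not part of the original study; the sketch only shows how a contact stream is binned into the half-open windows [tΔt, (t + 1)Δt).

```python
from collections import defaultdict


def aggregate_snapshots(events, dt):
    """Group timestamped contacts (time, u, v) into snapshots G_t(dt).

    Snapshot t holds the union of all edges with t*dt <= time < (t+1)*dt.
    Returns a dict {t: set of frozenset({u, v})}; empty windows are omitted.
    """
    if dt <= 0:
        raise ValueError("dt must be positive")
    snapshots = defaultdict(set)
    for time, u, v in events:
        if u == v:
            continue  # ignore self-loops
        t = int(time // dt)  # index of the window containing this event
        snapshots[t].add(frozenset((u, v)))  # repeated contacts collapse to one edge
    return dict(snapshots)
```

Calling the function with a larger dt merges consecutive snapshots, which is the operation whose effect on the walker is quantified in what follows.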

Activity-driven networks are a class of time-varying graphs characterized by two parameters (see Methods for further
details): m, the number of edges that are simultaneously created by a node, and dF(a) [I3], the fraction of nodes with
activity rate a. The activity rate determines the probability per unit time that a node establishes m simultaneous
edges to other nodes in the system. The value of the parameter m is dictated by the specific system under consideration.
The case m > 1 is appropriate to describe one-to-many interactions, found for example in systems such as Twitter
and blog networks [30, 31]. On the other hand, m = 1 describes two-party communications that are characteristic
of phone-call and text-message networks [3H]. As we will see, the latter case is particularly relevant, and in the
following sections we will refer to systems of this type as time-varying dyadic networks.



Analytic expression for arbitrary integrating windows. 

Let us define Q_{a|a'}(Δt) as the transition probability that a walker starting at a node with activity a' moves to a
node with activity a at the next step. To find an expression for Q_{a|a'}(Δt), note that at step t + 1 the neighbors of
V(t) can be classified into two types:

1. Passive destinations are neighbors of V(t) connected by edges created due to the activity of V(t) itself. The
endpoints away from V(t) are randomly sampled from the graph and thus their activity comes from the distribution
dF(a). We define mK_{Δt,a'} to be the number of such passive destinations.

2. Active destinations are neighbors of V(t) connected to V(t) by edges created due to their own activity. The
endpoint away from V(t) is biased towards high-activity nodes. More precisely, the activity distribution of such a
node is a dF(a)/⟨a⟩, where ⟨a⟩ is the average activity rate. We define H_{Δt} as the number of such active
destinations.

The word destinations highlights the fact that the walker moves from V(t) to one of these mK_{Δt,a'} + H_{Δt} nodes. For
sufficiently large N, H_{Δt} is Poisson distributed with average m⟨a⟩Δt and K_{Δt,a'} is Poisson distributed with average
a'Δt. If V(t) has at least one edge, the walker chooses to follow the edge of a passive destination with probability
mK_{Δt,a'}/(mK_{Δt,a'} + H_{Δt}). Conversely, the walker follows an edge towards an active destination with probability
H_{Δt}/(mK_{Δt,a'} + H_{Δt}). Unconditioning these expressions with respect to the values of K_{Δt,a'} and H_{Δt} we obtain,
for all values of a and a',

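Q_{a|a'}(\Delta t) = \lim_{\epsilon \to 0} \sum_{k=0}^{\infty} \sum_{h=0}^{\infty} \left( \frac{mk}{mk + h + \epsilon}\, dF(a) + \frac{h}{mk + h + \epsilon}\, \frac{a\, dF(a)}{\langle a \rangle} + \frac{\epsilon}{mk + h + \epsilon}\, \delta(a' - a) \right)    (1)

    \times \frac{(a' \Delta t)^k}{k!} \exp(-a' \Delta t) \times \frac{(m \langle a \rangle \Delta t)^h}{h!} \exp(-m \langle a \rangle \Delta t) ,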

where δ(a' − a) is the Dirac delta function, which is one if a' = a and zero otherwise, and ε → 0 is an auxiliary
variable used to avoid treating the special cases that lead to an undefined 0/0 separately from the main equation. While
we refer the reader to the Supplementary Information (SI) for the detailed derivation, each term in Eq. (1) has a
simple interpretation. The first term inside the double sum is the probability that the walker moves to a passive
destination that has activity a; the second term inside the double sum is the probability that the walker moves to an
active destination that has activity a; the third term inside the double sum is the probability that the node has no
edges after time Δt and thus the walker remains at V(t), hence not changing activity; the first term on the second line
gives the probability that K_{Δt,a'} = k, implying that km passive nodes are connected to V(t); and finally the second
term on the second line gives the probability that h active nodes are connected to V(t).

The most interesting special case of Eq. (1) concerns time-varying dyadic networks (m = 1). In these networks
Eq. (1) is greatly simplified (see SI):

Q_{a|a'}(\Delta t) = \frac{a' + a}{a' + \langle a \rangle}\, dF(a)\, \left(1 - C_{a',\Delta t}\right) + \delta(a' - a)\, C_{a',\Delta t} ,    (2)

where C_{a',Δt} = e^{−(a' + ⟨a⟩)Δt} is the probability that no edge is created at a node with activity a' during interval Δt.
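
As a numerical sanity check, the sketch below compares a truncated version of the double sum in Eq. (1) with the closed form of Eq. (2) for m = 1. It assumes a discretized activity distribution in which dF(a) is the fraction f_a of nodes in class a and δ(a' − a) selects the class a' = a; the function names and the truncation kmax are illustrative choices, not taken from the paper or its SI.

```python
import math


def poisson_pmf(k, mean):
    """Probability that a Poisson variable with the given mean equals k."""
    return math.exp(-mean) * mean**k / math.factorial(k)


def q_closed_form(a, a_prime, f_a, mean_a, dt):
    """Eq. (2): transition probability to class a (weight f_a) from activity a_prime."""
    c = math.exp(-(a_prime + mean_a) * dt)  # probability that no edge touches the node
    stay = c if a == a_prime else 0.0
    return (a_prime + a) / (a_prime + mean_a) * f_a * (1.0 - c) + stay


def q_double_sum(a, a_prime, f_a, mean_a, dt, m=1, kmax=60):
    """Truncated double sum of Eq. (1); the k = h = 0 term is the 'walker stays' case."""
    total = 0.0
    for k in range(kmax):
        pk = poisson_pmf(k, a_prime * dt)
        for h in range(kmax):
            ph = poisson_pmf(h, m * mean_a * dt)
            if k == 0 and h == 0:
                term = 1.0 if a == a_prime else 0.0  # epsilon -> 0 limit of the third term
            else:
                passive = m * k / (m * k + h) * f_a
                active = h / (m * k + h) * a * f_a / mean_a
                term = passive + active
            total += term * pk * ph
    return total
```

For moderate values of a'Δt and ⟨a⟩Δt the two functions agree to numerical precision, and summing q_closed_form over all classes a gives one, as expected of a transition probability.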

To find the RW stationary distribution we first note that the RW on the time-varying network is stationary and
ergodic (see SI). Thus, the RW occupation probability p_a, defined as the probability of finding the walker in a given
node of activity a, exists and is unique [33]. The value of p_a is the fixed-point solution of the following
Chapman-Kolmogorov set of equations [35]

p_a = \frac{1}{N\, dF(a)} \int_{a' \in \Omega} Q_{a|a'}(\Delta t)\, p_{a'}\, N\, dF(a') , \quad \forall a \in \Omega .    (3)



The solution to Eq. (3) can be easily obtained by numerical methods. However, for Δt → 0 the equation admits
a simple solution that reproduces the results in Perra et al. [18] (see SI). Next we see that in time-varying dyadic
networks Δt ≪ 1 and Δt ≫ 1 are also special cases that admit closed-form solutions.
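
One possible numerical route to the fixed point of Eq. (3) is plain power iteration on a discretized activity distribution. The sketch below assumes the dyadic kernel of Eq. (2), a finite set of activity classes with fractions summing to one, and a walker that starts uniformly at random. The function name, the iteration cap and the tolerance are illustrative assumptions, and the method is not necessarily the one used to produce the figures.

```python
import math


def stationary_occupation(activities, fractions, dt, n_nodes,
                          max_iter=200_000, tol=1e-13):
    """Per-node occupation probability p_a for a dyadic (m = 1) network.

    activities: activity rate of each class; fractions: dF(a) per class (sum to 1).
    """
    mean_a = sum(a * f for a, f in zip(activities, fractions))
    n = len(activities)
    # q[i][j]: probability of landing in class i when leaving a node of class j, Eq. (2).
    q = [[0.0] * n for _ in range(n)]
    for j, a_from in enumerate(activities):
        c = math.exp(-(a_from + mean_a) * dt)
        for i, a_to in enumerate(activities):
            q[i][j] = (a_from + a_to) / (a_from + mean_a) * fractions[i] * (1.0 - c)
        q[j][j] += c  # walker stays when its node has no edges
    # w[i] = N * dF(a_i) * p_{a_i}: probability mass in class i (uniform start).
    w = list(fractions)
    for _ in range(max_iter):
        new_w = [sum(q[i][j] * w[j] for j in range(n)) for i in range(n)]
        change = max(abs(x - y) for x, y in zip(new_w, w))
        w = new_w
        if change < tol:
            break
    return [w[i] / (n_nodes * fractions[i]) for i in range(n)]
```

With this kernel the iteration returns N p_a close to one for every class when (a + ⟨a⟩)Δt is small, and a p_a that grows linearly with a (proportional to a + ⟨a⟩) when Δt is large. Convergence is slow for very small Δt because the walker rarely moves.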

Static networks widely used in the research community are often the result of aggregating time-varying networks over
large aggregating windows, Δt ≫ 1 [1]. We connect our approach to static network results by contrasting the walker
occupation probability p_a of a time-varying dyadic network (Eq. (3)) against the static network occupation probability
p_k^static of a given static snapshot G_t(Δt) of the same network. On static networks the occupation probability p_k^static
is the probability that the walker is in a given node with degree k, which is proportional to the node degree, i.e., p_k^static ∝
k [35] [37]. In the limit where the network and the integrating windows are large, Eq. (3) simplifies to (see SI)

p_a \propto a .    (4)

In order to contrast p_k^static against p_a we exploit the proportionality between the degree of a node v, k_v, and
its activity, a_v. In the regime of large Δt, k_v ∝ a_v [17]. Combining this result with our derivation in Eq. (4) yields
p_{a_v} ∝ k_v, from which we conclude that p_{k_v}^static ∝ p_{a_v} for any node v over any static snapshot G_t(Δt), t = 0, 1, .... This
is an important result that clearly shows how our theory reduces to the well-known behavior found in static networks
for sufficiently large Δt. In our SI we also show that Eq. (4) holds for a broader range of time-varying aggregations
in which integrated edges have weights proportional to their level of activity.

A less intuitive result is found in the regime of very short aggregating windows, Δt ≪ 1. In time-varying dyadic
networks when Δt ≪ 1, Eq. (3) simplifies to (see SI)

p_a \approx \frac{1}{N} .    (5)

Thus, the walker is equally likely to be found at any node regardless of its activity rate. This might appear
counterintuitive at first, as one would expect more active nodes to be attractors for the random walker. However, when
Δt is small the network consists only of dyads and each node has degree either zero or one. Consequently,
highly active nodes lose and gain walkers at the same rate, giving rise to the homogeneous occupation probabilities in
Eq. (5). Moreover, Eq. (5) holds even over families of time-varying networks whose aggregated snapshots are strongly
correlated (see SI).
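
The uniform solution can be checked with a short first-order expansion. The outline below substitutes Eq. (2) into Eq. (3), treating δ(a' − a) as selecting the class a' = a, and expands C_{a,Δt} for small Δt. It is a heuristic sketch under these assumptions and does not replace the derivation in the SI.

```latex
\begin{align*}
p_a \bigl(1 - C_{a,\Delta t}\bigr)
  &= \int_{a' \in \Omega} \frac{a' + a}{a' + \langle a \rangle}\,
     \bigl(1 - C_{a',\Delta t}\bigr)\, p_{a'}\, dF(a')
  && \text{(Eq.~(2) inserted in Eq.~(3))} \\
1 - C_{a,\Delta t} &\approx (a + \langle a \rangle)\, \Delta t
  && (\Delta t \ll 1) \\
p_a\, (a + \langle a \rangle)
  &\approx \int_{a' \in \Omega} (a' + a)\, p_{a'}\, dF(a') \\
p_{a'} = \tfrac{1}{N}: \quad \frac{a + \langle a \rangle}{N}
  &= \frac{1}{N} \int_{a' \in \Omega} (a' + a)\, dF(a')
   = \frac{\langle a \rangle + a}{N} .
\end{align*}
```

Both sides of the last line coincide for every a, so p_a = 1/N satisfies the small-Δt fixed-point condition.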



The case of bipartite network projections. 

Time-varying networks can also be bipartite in nature, typically with one class of nodes representing the actors of a
system and the other class representing the groups or objects they interact with [3TH35] . Examples are the networks 
formed by scientific authors and the articles they co-author, listeners and songs, and consumers and products (books, 
smartphones, etc.). Time-varying edges can only be created between nodes of different classes, but the relationship 
between actors can be obtained from a simple projection, where all agents connected to the same object form a clique 
in the network. Similarly, the relationship among objects can also be obtained. Henceforth we denote these networks 
as time-varying projected bipartite networks. Note that unlike time-varying dyadic networks, in time-varying projected 
bipartite networks nodes form instantaneous cliques of any size. 
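
The projection described above can be sketched as follows. The function name project_window and the (time, actor, obj) event format are illustrative assumptions; the sketch simply turns every object touched within one window into a clique among its actors.

```python
from itertools import combinations


def project_window(events, dt, t):
    """Project bipartite events (time, actor, obj) in window t onto the actors.

    All actors attached to the same object within [t*dt, (t+1)*dt) form a clique.
    Returns a set of edges (u, v) with u < v.
    """
    members = {}
    for time, actor, obj in events:
        if t * dt <= time < (t + 1) * dt:
            members.setdefault(obj, set()).add(actor)
    edges = set()
    for group in members.values():
        for u, v in combinations(sorted(group), 2):
            edges.add((u, v))
    return edges
```

Swapping the roles of actor and obj in the input gives the projection onto the objects, as used for the song network in the Methods.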

Interestingly, the walker over such a time-varying projected bipartite network turns out to behave much like a walker
over a time-varying dyadic network. Let p_a^B denote the RW occupation probability on the time-varying projected
bipartite network. If Δt → 0 it is possible to show that p_a^B = 1/N [33] (see SI for the complete derivation). This is
because the occupation probability is shared among the nodes in the cliques created by the bipartite projections [34],
thus resulting in a homogeneous distribution of p_a. Interestingly, as we will see in the next sections, our experimental
results also indicate p_a^B ∝ p_a across a range of values of Δt > 0, once we adjust for the fact that a time-varying
projected bipartite network creates more edges than a time-varying dyadic network with the same activity distribution.
A precise mathematical formulation that derives this relationship between projected bipartite and dyadic networks
for arbitrary Δt > 0 remains an open problem due to the combinatorial difficulties that arise from the growth of
clique sizes as Δt gets larger.

Numerical simulations on synthetic networks. 

We support our analytical results with extensive Monte Carlo simulations of the RW process with various
activity-driven network parameters. We study networks characterized by N = 10^5 nodes and a power-law activity distribution
dF(a) ∝ a^{−γ} (as observed in many real networks [17]). We restricted the activity to the interval Ω = [10^{−3}, 1] in
order to avoid possible divergences in the limit a ≪ 1. As shown in Figure 2A, the exact solution reproduces the
simulations with great accuracy. Note that an increase of one order of magnitude in Δt (e.g. from Δt = 1 to Δt = 10) is enough
to elicit a sharp increase in the occupation probability at high-activity nodes. This increase, however, is met with
a slight occupation probability reduction at low-activity nodes. Also note that as Δt increases p_a approaches the
straight line p_a ∝ a, as predicted by Eq. (4). Similarly, as Δt gets smaller, p_a ≈ 1/N, as predicted by Eq. (5).
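
Activities with a truncated power-law distribution can be drawn by inverse-transform sampling. The sketch below assumes an exponent γ = 2 on the interval [10^{−3}, 1], matching the synthetic setup reported for Figure 2. The function names and the seed are illustrative, and the formula requires γ ≠ 1.

```python
import random


def sample_activity(u, gamma=2.0, lo=1e-3, hi=1.0):
    """Map a uniform number u in [0, 1] to an activity with dF(a) ∝ a**(-gamma) on [lo, hi]."""
    e = 1.0 - gamma  # exponent of the integrated density; must be non-zero
    return (lo**e + u * (hi**e - lo**e)) ** (1.0 / e)


def sample_activities(n, gamma=2.0, lo=1e-3, hi=1.0, seed=0):
    """Draw n activities using a seeded generator for reproducibility."""
    rng = random.Random(seed)
    return [sample_activity(rng.random(), gamma, lo, hi) for _ in range(n)]
```

With γ = 2 half of the sampled nodes have activity below roughly 2 × 10^{−3}, which illustrates how strongly the population is dominated by low-activity nodes.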

We also investigate the relationship between p_a and the number of simultaneous connections m. In Figure 2B we
show the results using the same parameters as before but changing the value of m from one to six. The increase
in p_a at high-activity nodes is much smoother than in the previous case m = 1. Low (high) activity nodes (whose
activity rates are smaller (larger) than the average ⟨a⟩) also have lower (higher) occupation probability at m = 6 than
at m = 1. This behavior is puzzling as, by increasing m, we are increasing in equal proportions both the average
number of passive and active destinations of all nodes, which (at least on average) would mean no change in occupation
probabilities. However, the overall activity on the network increases with m. Thus, walkers at low activity nodes 
move more quickly, which decreases (increases) the occupation probability at low (high) activity nodes. 

Integrating window effects in real datasets. 

We also study the impact of integrating windows on the stationary distribution of a RW over two datasets and
compare the results with the predictions of our theory. We consider two empirical time-varying (projected) bipartite
networks (see Methods for the details). The first is a time-varying co-authorship network of the Physical Review
Letters (PRL) journal from 1980 to 2006 [SD]. The second is the Yahoo! music dataset [41], with approximately
4.6 × 10^5 songs rated by approximately 2 × 10^5 Yahoo! users, observed over the course of six months [41].
Our experiment consists in running a RW process over these two time-varying networks with different
integrating windows Δt and recording the empirical walker occupation probability over multiple runs.

Strikingly, our theoretical predictions match the empirical behavior of the RW process over these real time-varying
networks remarkably well. In the case of PRL, we integrate over four distinct values of Δt ∈ {1, 10, 60, 182} days.
The solid points in Figure 3 show the empirical values of p_a observed in this dataset. These results are exact (see SI).
The standard deviations obtained from simulation runs with distinct starting days are shown as error bars. The solid
lines are the numerical solution of Eq. (3), where Q_{a|a'}(Δt) is as described in Eq. (2) (that is, we model the network
as a time-varying dyadic network), with Δt as a rescaling parameter (see SI). All numerical solutions use the same
activity distribution dF(a), extracted from the time-varying graph of Δt = 1 day. We note in passing that the dF(a)
extracted from larger values of Δt is roughly identical to the one at Δt = 1, as shown by Perra et al. [17] (see SI for
details). Figure 3 shows great agreement between the theoretical results and the simulations on real data. Moreover,
for small Δt = 1 the RW occupation probability is uniform and independent of node activity, as predicted by Eq. (5).
Interestingly, for m > 1 we did not observe good agreement with the theory, while for m = 1 the data match the
predicted theoretical behavior well, showing an interesting connection between projected bipartite and dyadic time-varying
networks.

In the Yahoo! song ratings time-varying network, we use four distinct values of Δt, namely one second, one hour,
six hours, and one day. The RW occupation probability p_a is shown in Figure 4 as solid points. All numerical
solutions use the same activity distribution dF(a), extracted from the time-varying graph with Δt of one second. Here,
as in the PRL experiment, the results are mostly insensitive to the value of Δt chosen to extract dF(a) (see SI for
details). Figure 4 shows that the theoretical results match the real data well, with some deviations for nodes in the
intermediate activity range at Δt of one day. As predicted by Eq. (4), as Δt increases the RW occupation probability
p_a approaches a straight line, and this effect is most prominent at high-activity nodes. Moreover, as in the PRL
network, at the smallest value of Δt (one second) the RW occupation probability is uniform and independent of node
activity, as predicted by Eq. (5).

Discussion and conclusion

For practical or technical reasons researchers are often forced, or simply tempted, to work with time-aggregated
representations of time-varying networks. However, such aggregation may impact the behavior of dynamical processes
taking place on top of these networks. Motivated by this observation, we have investigated the role played by time 
aggregation windows on the behavior of random walks, arguably the most widely studied diffusion process. 

Our results demonstrate that time aggregation procedures do have a significant impact on the characterization of
the dynamical process, even when aggregation windows are "relatively short." We have quantified this effect in a 
rigorous mathematical framework that (i) allows us to recover the results concerning static networks in the limit 
of infinite aggregation windows, and (ii) accurately describes the behavior observed in numerical simulations upon 
synthetic time-varying networks. Moreover, testing our predictions against real datasets we have shown that our 
model captures well the observed phenomenology not only qualitatively but also quantitatively. 

Overall, our work suggests that caution should be used when drawing general conclusions about dynamical processes 
on time-varying graphs extrapolated from their study on time-aggregated networks. At the same time, however, our 
theoretical results may help to investigate possible distortions introduced by the aggregating windows of data collection 
methods. 

Methods 
Activity-driven networks. 

Activity-driven network models are based on the activity patterns of nodes, which are used to explicitly model the
evolution of the network structure over time [17]. Each node i is characterized by an activity rate a_i, sampled from
the distribution dF(a). At each step t = 0, 1, ... the network G_t(Δt) is generated as follows:

a) G_t(Δt) starts with N disconnected nodes;

b) The number of times a node with activity a is active during the interval Δt, K_{Δt,a}, is Poisson distributed:

P[K_{\Delta t,a} = k] = \frac{(a \Delta t)^k}{k!} \exp(-a \Delta t) .

The node generates mK_{Δt,a} undirected edges connected to mK_{Δt,a} randomly selected nodes (without replacement
or self-loops). The parameter m represents the average number of instantaneous connections established by each
active node in the system. Nodes that are inactive in this observed period Δt may still receive connections from other
active vertices;

c) At time (t + 1)Δt the process starts over from step a) to generate the network G_{t+1}(Δt).
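
Steps a) to c) can be sketched as a small generator for a single snapshot. The function names are illustrative, and drawing m distinct targets separately for each firing is one reading of "without replacement" that is assumed here. The Poisson sampler uses Knuth's multiplication method, which is adequate only for moderate values of aΔt.

```python
import math
import random


def sample_poisson(mean, rng):
    """Knuth's multiplication method for a Poisson variate (moderate means only)."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def generate_snapshot(activities, dt, m=1, rng=None):
    """One aggregated snapshot G_t(dt) of an activity-driven network.

    Returns a set of undirected edges (u, v) with u < v; requires m <= N - 1.
    """
    rng = rng or random.Random()
    n = len(activities)
    edges = set()  # step a): N disconnected nodes
    for i, a in enumerate(activities):
        firings = sample_poisson(a * dt, rng)  # step b): K ~ Poisson(a * dt)
        others = [j for j in range(n) if j != i]  # no self-loops
        for _ in range(firings):
            for j in rng.sample(others, m):  # m distinct targets per firing
                edges.add((min(i, j), max(i, j)))
    return edges
```

Calling generate_snapshot once per step t, with fresh randomness each time, corresponds to step c), since every snapshot is generated independently of the previous one.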

It can be shown that the full dynamics of the network are encoded in the activity rate distribution, dF(a), and
that the time-aggregated measurement of network connectivity yields a degree distribution that follows the same
functional form as the distribution dF(a) in the limit of small k/Δt and k/N [17]. This is an important feature of the
model, which is able to reproduce basic statistical properties found in many real networks while giving a simple prescription
to characterize dynamical connectivity patterns explicitly.



Datasets & Simulation. 

In this study we considered two different empirical projections of bipartite time-varying networks: the
collaborations in the journal Physical Review Letters (PRL), published by the American Physical Society [ID], and the Yahoo!
music dataset made available by Yahoo! [41].

PRL dataset. The bipartite network representation of this dataset has two types of nodes: authors and papers. An
author is connected to all the papers she/he wrote in an integrating window Δt. We study the bipartite projection onto
the authors. In this representation each author of an article in PRL is a node. Undirected edges connect authors
that collaborate on the same article. We focus on small collaborations only, filtering out all the articles with more
than 10 authors. We consider the period between 1958 and 2006. The dataset contains 80,554 authors and 66,892
articles. The smallest timescale available is one day.

Yahoo! music dataset. In this dataset the bipartite network has two types of nodes: users and songs. We
study the bipartite projection onto the songs. Each node is a song and two songs are connected if at least one
user rated both in a time window Δt. The dataset contains 4.6 × 10^5 songs rated by 199,719 Yahoo! users,
collected over the course of six months [41]. User activity is recorded at a time resolution of seconds.

Simulation setup. We obtain the empirical walker occupation probability, p_a, as follows. We construct the transition
probability matrix P_t associated with the RW on the t-th aggregated network G_t(Δt), t = 0, ..., ⌊T/Δt⌋, where T is
the time of the last event in the dataset. The empirical RW occupation probability is obtained by multiplying the
matrices P_0 P_1 ⋯ P_n and then left-multiplying the result by the vector (1/N, ..., 1/N), which gives the walker equal
probability of starting at any node. We note in passing that similar results are obtained when the walker starts
at a handful of high-activity nodes.
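
The matrix product above never needs to be formed explicitly: the start vector can be pushed through one snapshot at a time. The sketch below assumes unweighted snapshots given as sets of (u, v) edges without self-loops and a walker that stays put on isolated nodes; the function names are illustrative and the code is not the authors' implementation.

```python
def propagate(prob, edges, nodes):
    """One RW step on a snapshot: prob <- prob * P_t, with P_t row-stochastic."""
    neighbours = {v: set() for v in nodes}
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    new_prob = {v: 0.0 for v in nodes}
    for v in nodes:
        if not neighbours[v]:
            new_prob[v] += prob[v]  # isolated node: the walker stays where it is
        else:
            share = prob[v] / len(neighbours[v])
            for w in neighbours[v]:
                new_prob[w] += share  # move to a uniformly chosen neighbour
    return new_prob


def occupation_probability(snapshots, nodes):
    """Left-multiply the uniform start vector by P_0 P_1 ... P_n."""
    prob = {v: 1.0 / len(nodes) for v in nodes}
    for edges in snapshots:
        prob = propagate(prob, edges, nodes)
    return prob
```

On snapshots made only of disjoint dyads the uniform vector is left unchanged by every step, which mirrors the homogeneous occupation probability found for very short windows.
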
FIG. 1: Example of time integration on time-varying networks. The random walker is located at the red node and Δt defines
the integration window.
FIG. 2: Occupation probability p_a of a RW over an activity-driven network with activity distribution dF(a) ∝ a^{−2}, a ∈ [10^{−3}, 1],
N = 10^5, for different values of m. In panel A) we plot the results for m = 1. In panel B) instead, m = 6. Solid lines represent
the analytical prediction of Eq. (3), integrated over Δt = 1, 10, 100 (diamonds, squares and circles) time windows. Note that in
both panels as Δt gets larger p_a ∼ a. Averages performed over 10^3 independent simulations.
FIG. 3: Occupation probability p_a of a RW at the end of the simulation as a function of node activity. The points are the
values of p_a on the Physical Review Letters time-varying co-authorship network from 1980 to 2006 for different integrating
windows Δt ∈ {1, 10, 60, 182} days. The solid lines are the numerical solution of Eq. (3). The error bars are evaluated by starting
the process at different starting points.
FIG. 4: Occupation probability p_a of a RW at the end of the simulation as a function of node activity. The points are the
values of p_a on the time-varying graph of Yahoo! song ratings for different integrating windows Δt of one second, one hour, six
hours, and one day. The solid lines are the numerical solution of Eq. (3). The standard deviations are too small to be shown
in the plots.



